Recall the ARMA$(p, q)$ model and its backshift notation: $\phi(B) X_t = \theta(B) Z_t$. We rewrite it as $X_t = \psi(B) Z_t$ with $\psi(B) = \theta(B)/\phi(B) = \sum_{j=0}^{\infty} \psi_j B^j$ and try to determine the coefficients $\psi_j$. We first write $\psi(z) \phi(z) = \theta(z)$. Factorize $\phi(z)$ as $\phi(z) = \prod_{i=1}^{p} (1 - z/z_i)$. Here $z_1, \dots, z_p$ are roots of $\phi(z) = 0$. Then
$$\frac{1}{\phi(z)} = \prod_{i=1}^{p} \frac{1}{1 - z/z_i}.$$
Recall the geometric series expansion in (2.4): if all $|z_i| > 1$, then $\frac{1}{1 - z/z_i} = \sum_{j=0}^{\infty} (z/z_i)^j$, and we can rewrite
$$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}.$$
This is a causal stationary process.
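As a sanity check, the $\psi_j$ can be computed numerically from the coefficient-matching recursion $\psi(z)\phi(z) = \theta(z)$. A minimal sketch for an ARMA(1,1), with illustrative values `phi = 0.6`, `theta = 0.3` (not from the text):

```python
import numpy as np

# psi-weights of a causal ARMA(1,1): X_t = phi*X_{t-1} + Z_t + theta*Z_{t-1}.
# Matching coefficients in psi(z)*(1 - phi*z) = 1 + theta*z gives
# psi_0 = 1, psi_1 = theta + phi, psi_j = phi * psi_{j-1} for j >= 2.
phi, theta, m = 0.6, 0.3, 20
psi = np.zeros(m)
psi[0] = 1.0
psi[1] = theta + phi
for j in range(2, m):
    psi[j] = phi * psi[j - 1]

# Closed form for ARMA(1,1): psi_j = (phi + theta) * phi^(j-1) for j >= 1.
closed = (phi + theta) * phi ** (np.arange(1, m) - 1)
print(np.allclose(psi[1:], closed))
```

The geometric decay of the $\psi_j$ reflects the root $1/\phi$ of the AR polynomial lying outside the unit circle.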
Since neither the PACF nor the ACF of a general ARMA$(p, q)$ process has a clear cutoff, we usually use AIC or BIC to determine $(p, q)$.
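The order search can be automated by fitting models of increasing order and comparing information criteria. A minimal numpy-only sketch, using AR models fitted by least squares as a simpler stand-in for full ARMA likelihoods (statsmodels' `ARIMA` class would handle the general case); the simulated series and all values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(2) series as stand-in data whose order is "unknown".
n = 500
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()

def ar_aic(x, p):
    """Fit AR(p) by least squares; AIC = n*log(RSS/n) + 2*(p+1)."""
    if p == 0:
        rss = np.sum((x - x.mean()) ** 2)
        return len(x) * np.log(rss / len(x)) + 2
    # Design matrix of lagged values x_{t-1}, ..., x_{t-p}.
    X = np.column_stack([x[p - 1 - j : len(x) - 1 - j] for j in range(p)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ coef) ** 2)
    return len(y) * np.log(rss / len(y)) + 2 * (p + 1)

aics = {p: ar_aic(x, p) for p in range(6)}
best_p = min(aics, key=aics.get)
print(best_p, aics[best_p])
```

BIC replaces the penalty $2(p+1)$ with $(p+1)\log n$ and tends to pick smaller orders.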
2 Box-Jenkins Time Series Modeling Strategy
The idea is:
- Transform $X_t$ to a series $Y_t$ that doesn't have any discernible trends.
- Fit an ARMA model to $Y_t$.
To implement the first step, we usually have two ways:
- Differencing: use $\nabla X_t = X_t - X_{t-1}$, $\nabla^2 X_t = X_t - 2X_{t-1} + X_{t-2}$, etc.
- Seasonal differencing: $\nabla_s X_t = X_t - X_{t-s}$, e.g. $\nabla_{12} X_t = X_t - X_{t-12}$.
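Both operations are one-liners in numpy. A small sketch on a hypothetical monthly series with a linear trend and a period-12 cycle:

```python
import numpy as np

# Hypothetical monthly series: linear trend + seasonal cycle + noise.
rng = np.random.default_rng(1)
t = np.arange(120)
x = 0.3 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.standard_normal(120)

# First difference removes the linear trend: (1 - B) x_t.
dx = np.diff(x)

# Seasonal difference with period s = 12 removes the cycle: (1 - B^12) x_t.
s = 12
sdx = x[s:] - x[:-s]

# Applying both handles trend and seasonality together.
both = np.diff(x[s:] - x[:-s])
print(dx.shape, sdx.shape, both.shape)
```

Each difference shortens the series: one observation per ordinary difference, $s$ per seasonal difference.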
3 ARIMA Models
ARIMA (AutoRegressive Integrated Moving Average)
$\{X_t\}$ is ARIMA$(p, d, q)$ if $Y_t = \nabla^d X_t$ is ARMA$(p, q)$, where $\nabla^d = (1 - B)^d$.
3.1 Seasonal ARMA Models
We say $\{X_t\}$ is a seasonal ARMA$(P, Q)_s$ with period $s$, if $\Phi(B^s) X_t = \Theta(B^s) Z_t$, where
$$\Phi(z) = 1 - \Phi_1 z - \cdots - \Phi_P z^P, \qquad \Theta(z) = 1 + \Theta_1 z + \cdots + \Theta_Q z^Q.$$
This is a special case of an ARMA$(Ps, Qs)$ model. But it has $P + Q + 1$ parameters (1 for $\sigma^2$) while a general ARMA$(Ps, Qs)$ has $Ps + Qs + 1$.
The ACF and PACF of a seasonal ARMA are non-zero only at the seasonal lags $h = ks$, $k = 1, 2, \dots$. At the seasonal lags, the PACF and ACF behave just like those of the corresponding non-seasonal ARMA$(P, Q)$ model: $\rho_X(ks) = \rho_{\mathrm{ARMA}(P,Q)}(k)$.
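This seasonal-lag behavior is easy to see empirically. A sketch that simulates a seasonal AR$(1)_{12}$ (illustrative $\Phi = 0.8$, not from the text) and checks the sample ACF:

```python
import numpy as np

rng = np.random.default_rng(2)
s, Phi, n = 12, 0.8, 2000

# Simulate a seasonal AR(1)_12: X_t = Phi * X_{t-12} + Z_t.
x = np.zeros(n)
x[:s] = rng.standard_normal(s)
for t in range(s, n):
    x[t] = Phi * x[t - s] + rng.standard_normal()

def acf(x, h):
    """Sample autocorrelation at lag h."""
    xc = x - x.mean()
    return np.dot(xc[h:], xc[:-h]) / np.dot(xc, xc) if h > 0 else 1.0

# Nonzero essentially only at seasonal lags:
# rho(12) ~ Phi, rho(24) ~ Phi^2, rho(1) ~ 0.
print(round(acf(x, 12), 2), round(acf(x, 24), 2), round(acf(x, 1), 2))
```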
3.2 Multiplicative Seasonal ARMA Models
Multiplicative Seasonal ARMA Model
ARMA$(p, q) \times (P, Q)_s$:
$$\phi(B)\,\Phi(B^s)\, X_t = \theta(B)\,\Theta(B^s)\, Z_t.$$
For a dataset whose sample autocorrelations are non-negligible both at low lags and near the seasonal lags $s, 2s, \dots$ (like the co2 dataset), we can use this model to reduce the number of parameters.
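The parameter saving can be seen from the polynomial product itself: for instance, an ARMA$(0,1) \times (0,1)_{12}$ has only 2 MA parameters yet produces nonzero MA coefficients at lags 1, 12, and 13. A sketch with illustrative values:

```python
import numpy as np

theta, Theta, s = 0.4, 0.6, 12

# MA polynomial of an ARMA(0,1)x(0,1)_12 model:
# theta(B) * Theta(B^s) = (1 + theta*B)(1 + Theta*B^s).
ma = np.array([1.0, theta])      # coefficients of 1 + theta*B
sma = np.zeros(s + 1)            # coefficients of 1 + Theta*B^s
sma[0], sma[s] = 1.0, Theta

# Convolving coefficient sequences = multiplying the polynomials.
full = np.convolve(ma, sma)

# Two free parameters, but nonzero coefficients at lags 0, 1, 12, 13.
nonzero = np.nonzero(full)[0]
print(nonzero, full[nonzero])
```

The cross term $\theta\Theta$ at lag 13 is what a purely additive seasonal model would miss.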
4 SARIMA Models
SARIMA model
SARIMA$(p, d, q) \times (P, D, Q)_s$:
$$\phi(B)\,\Phi(B^s)\,\nabla^d \nabla_s^D X_t = \theta(B)\,\Theta(B^s)\, Z_t.$$
Recall $\nabla^d = (1 - B)^d$ and $\nabla_s^D = (1 - B^s)^D$.
5 Parameter Estimation in MA(1)
Estimating the parameters of ARMA/ARIMA/SARIMA models is much harder than for AR models. We'll illustrate the difficulty using the example of MA(1): $X_t = Z_t + \theta Z_{t-1}$, $Z_t \overset{iid}{\sim} N(0, \sigma^2)$. Recall from before that its autocovariance function is given by
$$\gamma(0) = \sigma^2 (1 + \theta^2), \qquad \gamma(\pm 1) = \sigma^2 \theta, \qquad \gamma(h) = 0 \text{ for } |h| > 1.$$
The joint density of $(X_1, \dots, X_n)$ is multivariate normal with mean $0$ and covariance matrix $\Gamma_n$, where $(\Gamma_n)_{ij} = \gamma(i - j)$:
$$f(x_1, \dots, x_n) = (2\pi)^{-n/2} \det(\Gamma_n)^{-1/2} \exp\left(-\tfrac{1}{2} x^\top \Gamma_n^{-1} x\right).$$
The likelihood is $L(\theta, \sigma^2) = f(x_1, \dots, x_n)$, where $x = (x_1, \dots, x_n)^\top$ is the observed data. This is a function of $(\theta, \sigma^2)$, which can be estimated by maximizing the logarithm of the likelihood. Inverting the $n \times n$ matrix $\Gamma_n$, an $O(n^3)$ operation, makes this computationally expensive. We should use some approximation to skip the inverse.
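A direct implementation of this exact likelihood shows where the cost comes from: building and factorizing the $n \times n$ matrix $\Gamma_n$. A sketch for MA(1) on simulated data with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate MA(1): X_t = Z_t + theta * Z_{t-1}, with theta = 0.5, sigma = 1.
theta_true, n = 0.5, 200
z = rng.standard_normal(n + 1)
x = z[1:] + theta_true * z[:-1]

def exact_loglik(x, theta, sigma2):
    """Exact Gaussian log-likelihood of MA(1) via the n x n matrix Gamma_n.
    gamma(0) = sigma2*(1+theta^2), gamma(+-1) = sigma2*theta, 0 otherwise."""
    n = len(x)
    Gamma = np.zeros((n, n))
    i = np.arange(n)
    Gamma[i, i] = sigma2 * (1 + theta ** 2)
    Gamma[i[:-1], i[1:]] = sigma2 * theta   # superdiagonal
    Gamma[i[1:], i[:-1]] = sigma2 * theta   # subdiagonal
    sign, logdet = np.linalg.slogdet(Gamma)
    quad = x @ np.linalg.solve(Gamma, x)    # solve instead of explicit inverse
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

# The log-likelihood is higher near the true parameters than away from them.
print(exact_loglik(x, 0.5, 1.0), exact_loglik(x, -0.5, 1.0))
```

Even with `solve` instead of an explicit inverse, each likelihood evaluation is $O(n^3)$ for a dense factorization, which motivates the approximation below.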
An alternative approach is to find a connection to AR models. We can convert $X_t = Z_t + \theta Z_{t-1}$ to $Z_t = X_t - \theta Z_{t-1}$, so that by repeated substitution
$$X_t = -\sum_{j=1}^{\infty} (-\theta)^j X_{t-j} + Z_t,$$
an AR$(\infty)$ model. This requires $|\theta| < 1$. For this AR model, the likelihood is
$$\prod_{t=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{z_t^2}{2\sigma^2}\right).$$
This involves $x_0, x_{-1}, x_{-2}, \dots$, for which we have no data. We can simply let them be $0$. Now it becomes
$$L(\theta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{S(\theta)}{2\sigma^2}\right), \quad \text{where } S(\theta) = \sum_{t=1}^{n} z_t^2, \quad z_t = x_t - \theta z_{t-1}, \quad z_0 = 0.
$$
The MLE of $\theta$ comes from
$$\hat{\theta} = \arg\min_{\theta} S(\theta).$$
This is a nonlinear minimization that can be done with Python packages like scipy. It's easy to see that $\hat{\sigma}^2 = S(\hat{\theta})/n$.
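The conditional-sum-of-squares recursion and its minimization fit in a few lines. In this sketch a grid search over $(-1, 1)$ stands in for scipy's optimizer; the data and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate MA(1) data with theta = 0.5, sigma = 1 (hypothetical example).
theta_true, n = 0.5, 500
z = rng.standard_normal(n + 1)
x = z[1:] + theta_true * z[:-1]

def S(theta, x):
    """Conditional sum of squares: z_t = x_t - theta*z_{t-1}, z_0 = 0."""
    z_prev, total = 0.0, 0.0
    for xt in x:
        z_prev = xt - theta * z_prev
        total += z_prev ** 2
    return total

# Minimize over invertible values theta in (-1, 1);
# scipy.optimize.minimize_scalar would do the same more efficiently.
grid = np.linspace(-0.99, 0.99, 199)
Svals = np.array([S(th, x) for th in grid])
theta_hat = grid[np.argmin(Svals)]
sigma2_hat = Svals.min() / n
print(theta_hat, sigma2_hat)
```

Each evaluation of $S(\theta)$ is only $O(n)$, versus $O(n^3)$ for the exact likelihood.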
For uncertainty quantification, we can take a Bayesian approach. First assume the flat priors $\theta \sim \mathrm{Unif}(-1, 1)$ and $\sigma \sim \mathrm{Unif}(0, C)$ for a large $C$. Note that we restrict $|\theta| < 1$ (invertibility).
The posterior is then
$$p(\theta, \sigma \mid x) \propto \sigma^{-n} \exp\left(-\frac{S(\theta)}{2\sigma^2}\right), \qquad |\theta| < 1,\ 0 < \sigma < C.$$
To obtain the posterior of $\theta$ alone, we integrate the above w.r.t. $\sigma$. Then we have
$$p(\theta \mid x) \propto S(\theta)^{-(n-1)/2}.$$
This can be evaluated numerically over a grid of $\theta$ values and then normalized. Or, we can approximate it with a suitable t distribution by doing a Taylor expansion of $S(\theta)$ near $\hat{\theta}$. I.e.,
$$S(\theta) \approx S(\hat{\theta}) + \tfrac{1}{2} S''(\hat{\theta})(\theta - \hat{\theta})^2,$$
where we used $S'(\hat{\theta}) = 0$ because $\hat{\theta}$ is a minimizer, and $S''(\hat{\theta})$ is the Hessian of $S$. Therefore
$$p(\theta \mid x) \propto \left[1 + \frac{S''(\hat{\theta})(\theta - \hat{\theta})^2}{2\, S(\hat{\theta})}\right]^{-(n-1)/2}.$$
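A sketch of the grid evaluation, assuming the marginal posterior is proportional to $S(\theta)^{-(n-1)/2}$ (the result of integrating out $\sigma$ under a flat prior); the data and true value $\theta = 0.5$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated MA(1) data with theta = 0.5 (hypothetical example).
n = 300
z = rng.standard_normal(n + 1)
x = z[1:] + 0.5 * z[:-1]

def S(theta):
    """Conditional sum of squares with z_0 = 0."""
    z_prev, total = 0.0, 0.0
    for xt in x:
        z_prev = xt - theta * z_prev
        total += z_prev ** 2
    return total

# Marginal posterior on a grid: p(theta | x) proportional to S(theta)^(-(n-1)/2),
# with a flat prior on theta in (-1, 1) and sigma integrated out.
grid = np.linspace(-0.99, 0.99, 397)
h = grid[1] - grid[0]
logpost = -(n - 1) / 2 * np.log(np.array([S(th) for th in grid]))
logpost -= logpost.max()          # stabilize before exponentiating
post = np.exp(logpost)
post /= post.sum() * h            # normalize to a density on the grid

post_mean = np.sum(grid * post) * h
post_sd = np.sqrt(np.sum((grid - post_mean) ** 2 * post) * h)
print(round(post_mean, 2), round(post_sd, 3))
```

The grid posterior's mean and spread can then be compared against the t approximation derived below.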
Comparing with $\left[1 + \frac{1}{\nu}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right]^{-(\nu + p)/2}$ for the $p$-variate t-density $t_\nu(\mu, \Sigma)$, we see that (with $p = 1$)
$$\theta \mid x \;\approx\; t_{\nu}\left(\hat{\theta},\, \Sigma\right), \qquad \nu = n - 2, \qquad \Sigma = \frac{2\, S(\hat{\theta})}{(n - 2)\, S''(\hat{\theta})}.$$